Walking Fingerprinting Using Wrist Accelerometry during Activities of Daily Living in NHANES

Lily Koffman

Department of Biostatistics, Johns Hopkins School of Public Health

Introduction: accelerometry data

Introduction: big accelerometry data

Can we identify someone from their walking pattern measured by a wrist-worn accelerometer?

Problem setup

Big picture method: time series to scalar predictors

Fingerprints summarize predictors for a given lag and are different across individuals
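A minimal sketch of how a time series can be turned into scalar predictors for one lag, assuming the grid-cell approach of the cited fingerprinting papers: pair each acceleration value with its lagged value and count the pairs that fall in each cell of a grid. The helper name `lag_grid_counts`, the grid in `breaks`, the lag, and the simulated signal are all illustrative assumptions, not the authors' settings.

```r
# Hypothetical helper: count (v(t - lag), v(t)) pairs per grid cell for one window of data.
lag_grid_counts <- function(vm, lag, breaks) {
  n <- length(vm)
  stopifnot(lag >= 1, lag < n)
  lagged  <- vm[seq_len(n - lag)]   # v(t - lag)
  current <- vm[(lag + 1):n]        # v(t)
  # Bin both coordinates on the same grid; values outside `breaks` become NA
  # and are dropped by table().
  cell_x <- cut(lagged,  breaks = breaks, include.lowest = TRUE)
  cell_y <- cut(current, breaks = breaks, include.lowest = TRUE)
  as.vector(table(cell_x, cell_y))  # one scalar predictor per grid cell
}

set.seed(1)
vm <- abs(rnorm(100, mean = 1, sd = 0.3))  # simulated 1 s of vector magnitude (assumed 100 Hz)
breaks <- seq(0, 3, by = 0.25)             # assumed grid in g: 12 intervals per axis
x <- lag_grid_counts(vm, lag = 15, breaks = breaks)
length(x)                                  # 12 x 12 = 144 grid-cell counts
```

Repeating this for several lags and stacking the counts gives one predictor vector per window; plotting the counts for one lag as an image is one way to picture a "fingerprint".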

Model fitting

Results in labeled datasets

32 individuals, 6 minutes of walking each

100% rank-1 accuracy (Koffman et al. 2023)

153 individuals, 3 minutes of walking each

Two sessions, at least 1 week apart

Rank-1 (rank-5) % accuracies

  • Train and test on session 1
    • Logistic regression: 92 (97)
    • XGBoost: 93 (99)
  • Train on session 1, test on session 2
    • Logistic regression: 41 (75)
    • XGBoost: 58 (78)

(Koffman, Crainiceanu, and Leroux 2024)
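The rank-k accuracies above can be computed from a matrix of prediction scores. The sketch below assumes a square matrix in which row i holds test subject i's score for every candidate identity and the true identity of row i is column i; the helper name `rank_k_accuracy` and the toy scores are illustrative.

```r
# Hypothetical helper: share of test subjects whose true identity is among
# the k highest-scoring candidates.
rank_k_accuracy <- function(scores, k) {
  stopifnot(nrow(scores) == ncol(scores))
  true_score <- diag(scores)
  # Rank of the true candidate = 1 + number of candidates scoring strictly higher
  true_rank <- 1 + rowSums(scores > true_score)
  mean(true_rank <= k)
}

scores <- rbind(
  c(0.7, 0.2, 0.1),  # subject 1: true candidate ranked first
  c(0.5, 0.3, 0.2),  # subject 2: true candidate ranked second
  c(0.1, 0.2, 0.7)   # subject 3: true candidate ranked first
)
rank_k_accuracy(scores, k = 1)  # 2 of 3 subjects identified
rank_k_accuracy(scores, k = 2)  # all 3 subjects identified
```

Under the same assumptions, a percentage-based rank such as rank-1% would use `k = ceiling(0.01 * ncol(scores))`.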

Can we identify someone from their walking pattern measured by a wrist-worn accelerometer in large, unlabeled data sets?

NHANES data: recap

  • \(>15,000\) participants
  • \(7\) days of wrist accelerometry
  • \(10\) TB of data
  • Open source processing pipeline
  • Publicly available data repository
  • First nationally representative estimate of steps in the US population using open source algorithms

(Koffman and Muschelli 2025)

Walking fingerprinting in NHANES

Process outline

  • Use algorithm to identify walking
  • Partition data into train/test
  • Fit models

Use algorithms to identify walking


ADaptive Empirical Pattern Transformation (ADEPT) (Karas et al. 2019)

library(adept)
adept::segmentWalking(
  xyz, # data frame of tri-axial accelerometry (3 cols: x, y, z)
  xyz.fs = 100, # sampling frequency of xyz, in Hz
  template = templates # list of stride templates for pattern matching
)



stepcount (Small et al. 2024)

devtools::install_github("jhuwit/stepcount")
library(stepcount)
stepcount::stepcount(file = sample_data, model_type = "ssl")

Partition data into train/test
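A minimal sketch of the two partition types compared later (random vs. temporal), assuming one row per second of identified walking. The data frame layout, the column names, and the 75% training fraction are illustrative assumptions.

```r
# Hypothetical layout: one row per second of identified walking per subject.
set.seed(1)
walking <- data.frame(
  id   = rep(c("a", "b"), each = 10),
  time = rep(1:10, times = 2)   # seconds of walking, in chronological order
)
train_frac <- 0.75              # assumed training fraction

# Random partition: within each subject, pick training seconds at random.
walking$random_train <- ave(walking$time, walking$id, FUN = function(t) {
  n_train <- floor(train_frac * length(t))
  sample(rep(c(1, 0), c(n_train, length(t) - n_train)))
})

# Temporal partition: within each subject, the earliest seconds train and
# the latest seconds test.
walking$temporal_train <- ave(walking$time, walking$id, FUN = function(t) {
  as.numeric(rank(t, ties.method = "first") <= floor(train_frac * length(t)))
})

table(walking$id, walking$temporal_train)
```

The temporal split is the harder setting because test walking is separated in time from training walking, which matches the drop in accuracy seen when training and testing on different sessions.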

Fit models
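A minimal sketch of model fitting, assuming a one-vs-rest setup: one logistic regression per subject asks "does this second belong to subject j?", and an unknown walker is scored by averaging each model's predicted probability over their seconds. The simulated predictors `x1` and `x2` stand in for grid-cell counts and are purely illustrative.

```r
set.seed(1)
ids <- c("a", "b", "c")
# Simulated training data: 60 seconds per subject, two hypothetical predictors
train <- data.frame(
  id = rep(ids, each = 60),
  x1 = rpois(180, lambda = rep(c(4, 8, 12), each = 60)),
  x2 = rpois(180, lambda = rep(c(12, 8, 4), each = 60))
)

# One-vs-rest: one binary logistic regression per subject
fits <- lapply(ids, function(j) {
  glm(I(id == j) ~ x1 + x2, data = train, family = binomial())
})
names(fits) <- ids

# Seconds from one unknown walker, simulated like subject "c"
test <- data.frame(x1 = rpois(30, 12), x2 = rpois(30, 4))

# Average each subject-specific model's predicted probability over the seconds
scores <- sapply(fits, function(f) {
  mean(predict(f, newdata = test, type = "response"))
})
names(which.max(scores))  # label of the highest-scoring subject model
```

Stacking such score vectors for every test subject gives the matrix from which rank-based accuracies are computed.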

Questions

  • How important is sample size?
    • Fit on subgroups of size \(n=100\) and entire \(N\)
  • How important is walking identification algorithm accuracy?
    • Compare ADEPT, stepcount on same individuals
  • How important is length of training data?
    • Train on 3 min vs. 30 min.
  • How important is train/test partition type?
    • Compare random vs. temporal
  • How important is model choice?
    • Compare logistic regression, XGBoost, random forest
  • Can we improve logistic regression?
    • Weighting, oversampling
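The last question concerns class imbalance: in a one-vs-rest model with thousands of subjects, the target subject contributes a tiny share of the training rows. The sketch below shows one possible form of weighting and of oversampling on simulated data. The equal-total-weight scheme and the reading of "10%" as the target subject's share of the training rows are assumptions, not confirmed details of the authors' models.

```r
set.seed(1)
n <- 1000
y <- rep(c(1, 0), c(10, 990))                 # 1% of seconds come from the target subject
x <- rnorm(n, mean = ifelse(y == 1, 1.5, 0))  # simulated predictor
dat <- data.frame(y, x)

# Weighting (assumed scheme): up-weight target rows so both classes carry
# equal total weight.
w <- ifelse(dat$y == 1, sum(dat$y == 0) / sum(dat$y == 1), 1)
fit_weighted <- glm(y ~ x, data = dat, family = quasibinomial(), weights = w)

# Oversampling (assumed scheme): resample target rows with replacement until
# they make up 10% of the training data.
target_frac <- 0.10
n_neg <- sum(dat$y == 0)
n_pos_needed <- round(target_frac * n_neg / (1 - target_frac))
pos_rows <- sample(which(dat$y == 1), n_pos_needed, replace = TRUE)
dat_over <- dat[c(which(dat$y == 0), pos_rows), ]
fit_over <- glm(y ~ x, data = dat_over, family = binomial())

mean(dat_over$y)  # share of target rows after oversampling
```

Both variants change only how much the target subject's seconds count during fitting; the predictors and the scoring step stay the same.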

Sample size

Algorithm

Length of data

Train/test partition

Model choice

Model improvements

Dataset              Model                Rank 1   Rank 1%   Rank 5   Rank 5%
Random (n=13,367)    Logistic             9.7      68        21       93
                     Oversampled at 10%   41       68        95       99
                     Weighted             34       96        61       100
                     Two-stage            20       68        37       93
Temporal (n=10,770)  Logistic             0.028    26        5.1      49
                     Oversampled at 10%   4.3      32        10       52
                     Weighted             1.8      23        5.1      45
                     Two-stage            5.2      26        10       49

Rank 1, rank 1%, rank 5, and rank 5% accuracies of each model type on the entire population. The best model in each category is bolded.

Fingerprints

Summary


  • We can identify individuals from their walking patterns in large, unlabeled datasets using predictors derived from acceleration and lagged acceleration
  • Performance depends on sample size, walking identification algorithm, train/test partition, length of training data, and model choice
  • Preprint on arXiv (Koffman, Muschelli, and Crainiceanu 2025)

Future directions


  • Using CNN/deep learning models
  • Using the fingerprint as the outcome instead of as a predictor, to investigate change over time or association with health status/mortality

Thank you!



References

Karas, Marta, Marcin Strączkiewicz, William Fadel, Jaroslaw Harezlak, Ciprian M Crainiceanu, and Jacek K Urbanek. 2019. “Adaptive Empirical Pattern Transformation (ADEPT) with Application to Walking Stride Segmentation.” Biostatistics 22 (2): 331–47. https://doi.org/10.1093/biostatistics/kxz033.
Koffman, Lily, Ciprian Crainiceanu, and Andrew Leroux. 2024. “Walking Fingerprinting.” Journal of the Royal Statistical Society Series C: Applied Statistics 73 (5): 1221–41. https://doi.org/10.1093/jrsssc/qlae033.
Koffman, Lily, and John Muschelli. 2025. “Minute Level Step Counts and Physical Activity Data from the National Health and Nutrition Examination Survey (NHANES) 2011-2014.” PhysioNet. https://doi.org/10.13026/9N0R-TV02.
Koffman, Lily, John Muschelli, and Ciprian Crainiceanu. 2025. “Walking Fingerprinting Using Wrist Accelerometry During Activities of Daily Living in NHANES.” https://arxiv.org/abs/2506.17160.
Koffman, Lily, Yan Zhang, Jaroslaw Harezlak, Ciprian Crainiceanu, and Andrew Leroux. 2023. “Fingerprinting Walking Using Wrist-Worn Accelerometers.” Gait & Posture 103 (June): 92–98. https://doi.org/10.1016/j.gaitpost.2023.05.001.
Small, Scott R, Shing Chan, Rosemary Walmsley, Lennart von Fritsch, Aidan Acquah, Gert Mertes, Benjamin G Feakins, et al. 2024. “Self-Supervised Machine Learning to Characterize Step Counts from Wrist-Worn Accelerometers in the UK Biobank.” Medicine and Science in Sports and Exercise 56 (10): 1945.